AITopics | apprentice policy

apprentice policy

8ba6c657b03fc7c8dd4dff8e45defcd2-Paper.pdf

Neural Information Processing Systems | Feb-9-2026, 07:26:10 GMT

algorithm, apprentice policy, molecule, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

8ba6c657b03fc7c8dd4dff8e45defcd2-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-9-2026, 07:25:58 GMT

algorithm, gegl, molecule, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.31)

Thinking Fast and Slow with Deep Learning and Tree Search

Thomas Anthony, Zheng Tian, David Barber

Neural Information Processing Systems | Nov-21-2025, 13:08:05 GMT

Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning

Bajgar, Ondrej; Gould, Dewi S. W.; Liu, Jonathon; Abate, Alessandro; Gatsis, Konstantinos; Osborne, Michael A.

arXiv.org Artificial Intelligence | Sep-22-2025

As AI systems become increasingly autonomous, reliably aligning their decision-making with human preferences is essential. Inverse reinforcement learning (IRL) offers a promising approach to infer preferences from demonstrations. These preferences can then be used to produce an apprentice policy that performs well on the demonstrated task. However, in domains like autonomous driving or robotics, where errors can have serious consequences, we need not just good average performance but reliable policies with formal guarantees -- yet obtaining sufficient human demonstrations for reliability guarantees can be costly. Active IRL addresses this challenge by strategically selecting the most informative scenarios for human demonstration. We introduce PAC-EIG, an information-theoretic acquisition function that directly targets probably-approximately-correct (PAC) guarantees for the learned policy -- providing the first such theoretical guarantee for active IRL with noisy expert demonstrations. Our method maximises information gain about the regret of the apprentice policy, efficiently identifying states requiring further demonstration. We also present Reward-EIG as an alternative when learning the reward itself is the primary objective. Focusing on finite state-action spaces, we prove convergence bounds, illustrate failure modes of prior heuristic methods, and demonstrate our method's advantages experimentally.
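
To make the idea of an information-theoretic acquisition function concrete, here is a minimal sketch, not the paper's PAC-EIG definition. It assumes a finite set of reward hypotheses with a known posterior, and scores a candidate state by the mutual information between the expert's next (noisy) action and a label attached to each hypothesis. Using one label per hypothesis targets the reward itself; using a binary label such as "apprentice regret is below a threshold under this hypothesis" targets the regret event. The function names, the `label` argument, and the example numbers are all hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def expected_information_gain(prior, likelihood, label):
    """Mutual information between the expert's next action and label[h].

    prior      -- P(h) for each hypothesis h
    likelihood -- likelihood[h][a] = P(action a | candidate state, hypothesis h)
    label      -- label[h] groups hypotheses by the quantity we care about
    """
    n_actions = len(likelihood[0])
    groups = {}
    for h, z in enumerate(label):
        groups.setdefault(z, []).append(h)

    # Predictive distribution over the expert's action, averaged over hypotheses.
    marginal = [
        sum(prior[h] * likelihood[h][a] for h in range(len(prior)))
        for a in range(n_actions)
    ]

    # Expected entropy of the action once the label is known.
    conditional = 0.0
    for members in groups.values():
        p_z = sum(prior[h] for h in members)
        if p_z == 0:
            continue
        cond = [
            sum(prior[h] * likelihood[h][a] for h in members) / p_z
            for a in range(n_actions)
        ]
        conditional += p_z * entropy(cond)

    return entropy(marginal) - conditional


# Two equally likely hypotheses, two candidate states (illustrative numbers).
prior = [0.5, 0.5]
state_a = [[0.9, 0.1], [0.1, 0.9]]  # hypotheses disagree about the expert's action
state_b = [[0.5, 0.5], [0.5, 0.5]]  # hypotheses predict the same behaviour

gain_a = expected_information_gain(prior, state_a, label=[0, 1])
gain_b = expected_information_gain(prior, state_b, label=[0, 1])
```

In this toy setting `state_a` scores higher than `state_b`, so an active learner would request a demonstration there. If both hypotheses share the same label (for example, the apprentice's regret is acceptable under either), the gain is zero even at `state_a`, which is the sense in which a regret-targeted criterion can stop asking about distinctions that do not affect the guarantee.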

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2508.03693

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)

Genre:

Research Report (1.00)
Workflow (0.93)

Industry: Transportation (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

8ba6c657b03fc7c8dd4dff8e45defcd2-Paper.pdf

Neural Information Processing Systems | Aug-15-2025, 01:36:39 GMT

algorithm, apprentice policy, molecule, (15 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

our responses to the comments. 4 Response to R1

Neural Information Processing Systems | Aug-15-2025, 01:36:27 GMT

We sincerely thank all reviewers for their valuable efforts and insightful comments. We thank R1 for the helpful comment. Following R1's insightful suggestion, we compared GEGL with an additional "ablation" [...] We thank R1 for the opportunity to make the following clarifications. We thank R2 and R3 for mentioning an important point. R2's comment: the current literature fails to search for a molecule that is high-scoring and realistic simultaneously.

algorithm, gegl, molecule, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.31)

Thinking Fast and Slow with Deep Learning and Tree Search

Thomas Anthony, Zheng Tian, David Barber

Neural Information Processing Systems | Oct-4-2024, 07:26:36 GMT

Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans.

algorithm, iteration, learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Guiding Deep Molecular Optimization with Genetic Exploration

Ahn, Sungsoo; Kim, Junsu; Lee, Hankook; Shin, Jinwoo

arXiv.org Machine Learning | Oct-27-2020

De novo molecular design attempts to search over the chemical space for molecules with the desired property. Recently, deep learning has gained considerable attention as a promising approach to solve the problem. In this paper, we propose genetic expert-guided learning (GEGL), a simple yet novel framework for training a deep neural network (DNN) to generate highly-rewarding molecules. Our main idea is to design a "genetic expert improvement" procedure, which generates high-quality targets for imitation learning of the DNN. Extensive experiments show that GEGL significantly improves over state-of-the-art methods. For example, GEGL manages to solve the penalized octanol-water partition coefficient optimization with a score of 31.40, while the best-known score in the literature is 27.22. Besides, for the GuacaMol benchmark with 20 tasks, our method achieves the highest score for 19 tasks, in comparison with state-of-the-art methods, and newly obtains the perfect score for three tasks. Our training code is available at https://github.com/
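
The loop described in this abstract (a generator proposes candidates, a genetic expert improves them, and the generator imitates the improved set) can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' code: bit strings stand in for molecules, the number of ones stands in for the property score, a vector of independent Bernoulli probabilities stands in for the deep neural network, and the top-k buffer, mutation rate, and all function names are hypothetical choices not given in the abstract.

```python
import random

LENGTH = 20  # toy "molecule": a bit string of this length


def score(x):
    """Toy reward: number of ones in the string."""
    return sum(x)


def sample(probs, rng):
    """Apprentice proposal: draw each bit independently."""
    return tuple(int(rng.random() < p) for p in probs)


def mutate(x, rng, rate=0.05):
    """Flip each bit with a small probability."""
    return tuple(1 - b if rng.random() < rate else b for b in x)


def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    cut = rng.randrange(1, LENGTH)
    return a[:cut] + b[cut:]


def top_k(candidates, k):
    """Keep the k highest-scoring distinct candidates."""
    return sorted(set(candidates), key=score, reverse=True)[:k]


def gegl_toy(rounds=10, n_samples=32, k=8, seed=0):
    rng = random.Random(seed)
    probs = [0.5] * LENGTH  # untrained apprentice
    buffer = []             # best candidates seen so far
    best_history = []
    for _ in range(rounds):
        # 1. The apprentice proposes candidates.
        proposals = [sample(probs, rng) for _ in range(n_samples)]
        buffer = top_k(buffer + proposals, k)
        # 2. The genetic expert improves on the current best candidates.
        children = [
            mutate(crossover(rng.choice(buffer), rng.choice(buffer), rng), rng)
            for _ in range(n_samples)
        ]
        buffer = top_k(buffer + children, k)
        # 3. Imitation: refit the apprentice to the buffer (clipped frequencies).
        probs = [
            min(0.95, max(0.05, sum(x[i] for x in buffer) / len(buffer)))
            for i in range(LENGTH)
        ]
        best_history.append(score(buffer[0]))
    return best_history


history = gegl_toy()
```

Because the buffer always retains its best element, `history` is non-decreasing by construction. The interesting design choice is step 3: the apprentice is never trained on the reward directly, only on the candidates that survived selection.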

algorithm, molecule, optimization, (14 more...)

arXiv.org Machine Learning

2007.04897

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > Promising Solution (0.74)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Chemicals (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Thinking Fast and Slow with Deep Learning and Tree Search

Anthony, Thomas; Tian, Zheng; Barber, David

Neural Information Processing Systems | Dec-31-2017

Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex1.0, the most recent Olympiad Champion player to be publicly released.
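
The planning/generalisation split described in this abstract can be sketched as a short loop. This is a toy illustration, not the authors' implementation: it swaps in a depth-limited negamax search for the tree search, a lookup table for the deep neural network, and the game of Nim (take 1 to 3 stones, whoever takes the last stone wins) for Hex. All function names are hypothetical.

```python
MOVES = (1, 2, 3)


def legal(n):
    """Moves available with n stones left."""
    return [m for m in MOVES if m <= n]


def rollout(policy, n):
    """Both sides follow the apprentice; +1 if the player to move wins, else -1."""
    if n == 0:
        return -1  # the previous player took the last stone
    sign = 1
    while True:
        n -= policy.get(n, 1)  # untrained apprentice defaults to taking 1
        if n == 0:
            return sign
        sign = -sign


def search(policy, n, depth):
    """Negamax value of n stones; leaves are scored by apprentice rollouts."""
    if n == 0:
        return -1
    if depth <= 0:
        return rollout(policy, n)
    return max(-search(policy, n - m, depth - 1) for m in legal(n))


def expert_move(policy, n, depth):
    """Slow expert: search a few plies ahead, guided by the apprentice at the leaves."""
    return max(legal(n), key=lambda m: -search(policy, n - m, depth - 1))


def expert_iteration(max_pile=12, depth=2, iterations=5):
    policy = {}  # fast apprentice: pile size -> move
    for _ in range(iterations):
        # Expert improvement: plan a move for every state with the current apprentice.
        targets = {n: expert_move(policy, n, depth) for n in range(1, max_pile + 1)}
        # Imitation learning: a tabular apprentice simply copies the expert's moves.
        policy = targets
    return policy


policy = expert_iteration()
```

With these default settings the returned table plays the known optimal Nim strategy of leaving the opponent a multiple of four stones. The two-ply search alone cannot see that far; each iteration extends the range of piles the apprentice plays correctly, and the stronger apprentice in turn makes the next round of search more accurate.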

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
